How Neural Networks (NN) Can (Hopefully) Learn Faster by Taking Into Account Known Constraints

Authors

  • Chitta Baral
  • Martine Ceberio
  • Vladik Kreinovich
Abstract

Neural networks are a very successful machine learning technique. At present, deep (multi-layer) neural networks are the most successful among the known machine learning techniques. However, they still have some limitations. One of their main limitations is that their learning process is still too slow. The major reason why learning in neural networks is slow is that neural networks are currently unable to take prior knowledge into account. As a result, they simply ignore this knowledge and simulate learning “from scratch”. In this paper, we show how neural networks can take prior knowledge into account and thus, hopefully, learn faster.

1 Formulation of the Problem

Need for machine learning. In many practical situations, we know that the quantities $y_1, \ldots, y_L$ depend on the quantities $x_1, \ldots, x_n$, but we do not know the exact formula for this dependence. To get this formula, we measure the values of all these quantities in different situations $m = 1, \ldots, M$, and then use the corresponding measurement results $x_i^{(m)}$ and $y_l^{(m)}$ to find the corresponding dependence. Algorithms that “learn” the dependence from the measurement results are known as machine learning algorithms.

Neural networks (NN): main idea and successes. One of the most widely used machine learning techniques is the technique of neural networks (NN), which is based on a (simplified) simulation of how actual neurons work in the human brain (a brief technical description of this technique is given in Section 2). This technique has many useful applications; see, e.g., [1, 2]. At present (2016), multi-layer (“deep”) neural networks are, empirically, the most efficient of the known machine learning techniques.

Neural networks: limitations. One of the main limitations of neural networks is that their learning is very slow: they need many thousands of iterations just to learn a simple dependence. This slowness is easy to explain: the current neural networks always start “from scratch”, from zero knowledge. In terms of simulating the human brain, they do not simulate how we learn the corresponding dependence – they simulate how a newborn child would eventually learn to recognize this dependence. Of course, this inability to take any prior knowledge into account drastically slows down the learning process.

What is prior knowledge. Prior knowledge means that we know some relations (“constraints”) between the desired values $y_1, \ldots, y_L$ and the observed values $x_1, \ldots, x_n$, i.e., we know several relations of the type $f_c(x_1, \ldots, x_n, y_1, \ldots, y_L) = 0$, $1 \le c \le C$.

Prior knowledge helps humans learn faster. Prior knowledge helps us learn. Yes, it takes some time to learn this prior knowledge, but this learning has been done before we have samples of $x_i$ and $y_l$. As a result, the time from gathering the samples to generating the desired dependence decreases. This is not simply a matter of accounting: the same prior knowledge can be used (and usually is used) in learning several different dependencies. For example, our knowledge of sines, logarithms, and calculus helps in finding the proper dependence in many different situations. So, when we learn the prior knowledge first, we decrease the overall time needed to learn all these dependencies.

How to speed up artificial neural networks: a natural idea. In view of the above explanation, a natural idea is to enable neural networks to take prior knowledge into account. In other words, instead of learning all the data “from scratch”, we should first learn the constraints. Then, when it is time to use the data, we should be able to use these constraints to “guide” the neural network in the right direction.
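To make the notion of a constraint concrete, here is a minimal sketch (in Python with NumPy; it is not part of the original paper) of how prior knowledge of the form $f_c(x_1, \ldots, x_n, y_1, \ldots, y_L) = 0$ might be represented in code. The two specific constraints used below are purely hypothetical illustrations.

    import numpy as np

    # Hypothetical prior knowledge, encoded as functions f_c(x, y) that
    # must equal 0 whenever the constraint holds.
    def f1(x, y):
        # e.g., "the two outputs are known to sum to the first input"
        return y[0] + y[1] - x[0]

    def f2(x, y):
        # e.g., "the second output is known to be the logarithm of the second input"
        return y[1] - np.log(x[1])

    constraints = [f1, f2]

    def constraint_violation(x, y):
        # Sum of squared constraint values; it is 0 exactly when all constraints hold.
        return sum(fc(x, y) ** 2 for fc in constraints)

    # A (hypothetical) observation that satisfies both constraints exactly:
    x = np.array([3.0, 2.0])
    y = np.array([3.0 - np.log(2.0), np.log(2.0)])
    print(constraint_violation(x, y))  # prints a value close to 0

A measure of this type is exactly what Section 3 minimizes when pre-training the network on constraint-satisfying data.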
What we do in this paper. In this paper, we show how to implement this idea and thus, how to (hopefully) achieve the corresponding speed-up. To describe this idea, we first, in Section 2, recall how the usual NN works. Then, in Section 3, we show how we can perform a preliminary training of a NN, so that it can learn to satisfy the given constraints. Finally, in Section 4, we show how to train the resulting pre-trained NN in such a way that the constraints remain satisfied.

2 Neural Networks: A Brief Reminder

Signals in a biological neural network. In a biological neural network, a signal is represented by a sequence of spikes. All these spikes are largely the same; what differs is how frequently the spikes come. Several sensor cells generate such sequences: e.g., there are cells that translate the optical signal into spikes, and there are cells that translate the acoustic signal into spikes. For all such cells, the more intense the original physical signal, the more spikes per unit time the cell generates. Thus, the frequency of the spikes can serve as a measure of the strength of the original signal. From this viewpoint, at each point in a biological neural network, at each moment of time, the signal can be described by a single number: namely, by the frequency of the corresponding spikes.

What is a biological neuron: a brief description. A biological neuron has several inputs and one output. Usually, spikes from different inputs simply get together – probably after some filtering. Filtering means that we suppress a certain proportion of spikes. If we start with an input signal $x_i$, then, after such filtering, we get a decreased signal $w_i \cdot x_i$. Once all the input signals are combined, we have the resulting signal $\sum_{i=1}^n w_i \cdot x_i$. A biological neuron usually has some excitation level $w_0$, so that if the overall input signal is below $w_0$, there is practically no output. The intensity of the output signal thus depends on the difference $d \stackrel{\rm def}{=} \sum_{i=1}^n w_i \cdot x_i - w_0$. Some neurons are linear: their output is proportional to this difference. Other neurons are non-linear: their output is equal to $s_0(d)$ for some non-linear function $s_0(z)$. Empirically, it was found that the corresponding non-linear transformation takes the form $s_0(z) = \frac{1}{1 + \exp(-z)}$.

Comment. It should be mentioned that this is a simplified description of a biological neuron: the actual neuron is a complex dynamical system, in the sense that its output at a given moment of time depends not only on the current inputs, but also on the previous input values.

Artificial neural networks and how they learn. If we need to predict the values of several outputs $y_1, \ldots, y_l, \ldots, y_L$, then for each output $y_l$, we train a separate neural network. In an artificial neural network, the input signals $x_1, \ldots, x_n$ first go to the neurons of the first layer, then the results go to the neurons of the second layer, etc. In the simplest (and most widely used) arrangement, the second layer consists of linear neurons. In this arrangement, the neurons from the first layer produce the signals $y_{l,k} = s_0\left(\sum_{i=1}^n w_{l,ki} \cdot x_i - w_{l,k0}\right)$, $1 \le k \le K_l$, which are then combined into the output $y_l = \sum_{k=1}^{K_l} W_{l,k} \cdot y_{l,k} - W_{l,0}$. This is called forward propagation. (In this paper, we will only describe formulas for this arrangement, since formulas for the multi-layer neural networks can be obtained by using the same idea.)
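The forward propagation just described can be written down in a few lines. The following sketch (Python with NumPy; not part of the original paper, and the weight values are arbitrary placeholders) computes one output $y_l$ from a sigmoid first layer and a linear second layer:

    import numpy as np

    def s0(z):
        # sigmoid activation s0(z) = 1 / (1 + exp(-z))
        return 1.0 / (1.0 + np.exp(-z))

    def forward(x, w, w0, W, W0):
        # x  : inputs x_1, ..., x_n
        # w  : K x n matrix of first-layer weights w_{l,ki}
        # w0 : K first-layer thresholds w_{l,k0}
        # W  : K second-layer weights W_{l,k}
        # W0 : second-layer threshold W_{l,0}
        y_k = s0(w @ x - w0)   # first (non-linear) layer
        y_l = W @ y_k - W0     # second (linear) layer
        return y_l, y_k        # y_k is also returned, since the learning step needs it

    # Example with arbitrary numbers (n = 3 inputs, K = 2 first-layer neurons):
    x = np.array([0.5, -1.0, 2.0])
    w = np.array([[0.1, 0.2, -0.3], [0.4, -0.5, 0.6]])
    w0 = np.array([0.0, 0.1])
    W = np.array([1.0, -2.0])
    W0 = 0.5
    print(forward(x, w, w0, W, W0))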
How a NN learns: derivation of the formulas. Once we have an observation $(x_1^{(m)}, \ldots, x_n^{(m)}, y_l^{(m)})$, we first input the values $x_1^{(m)}, \ldots, x_n^{(m)}$ into the current NN; the network generates some output $y_{l,NN}$. In general, this output is different from the observed output $y_l^{(m)}$. We therefore want to modify the weights $W_{l,k}$ and $w_{l,ki}$ so as to minimize the squared difference $J \stackrel{\rm def}{=} (\Delta y_l)^2$, where $\Delta y_l \stackrel{\rm def}{=} y_{l,NN} - y_l^{(m)}$. This minimization is done by using gradient descent, where each of the unknown values is updated as $W_{l,k} \to W_{l,k} - \lambda \cdot \frac{\partial J}{\partial W_{l,k}}$ and $w_{l,ki} \to w_{l,ki} - \lambda \cdot \frac{\partial J}{\partial w_{l,ki}}$. The resulting algorithm for updating the weights is known as backpropagation.

This algorithm is based on the following idea. First, one can easily check that $\frac{\partial J}{\partial W_{l,0}} = -2\Delta y_l$, so $\Delta W_{l,0} = -\lambda \cdot \frac{\partial J}{\partial W_{l,0}} = \alpha \cdot \Delta y_l$, where $\alpha \stackrel{\rm def}{=} 2\lambda$. Similarly, $\frac{\partial J}{\partial W_{l,k}} = 2\Delta y_l \cdot y_{l,k}$, so $\Delta W_{l,k} = -\lambda \cdot \frac{\partial J}{\partial W_{l,k}} = -2\lambda \cdot \Delta y_l \cdot y_{l,k}$, i.e., $\Delta W_{l,k} = -\Delta W_{l,0} \cdot y_{l,k}$. The only dependence of $y_l$ on $w_{l,ki}$ is via the dependence of $y_{l,k}$ on $w_{l,ki}$. So, for $w_{l,k0}$, we can use the chain rule and get $\frac{\partial J}{\partial w_{l,k0}} = \frac{\partial J}{\partial y_{l,k}} \cdot \frac{\partial y_{l,k}}{\partial w_{l,k0}}$, hence $\frac{\partial J}{\partial w_{l,k0}} = 2\Delta y_l \cdot W_{l,k} \cdot s_0'\left(\sum_{i=1}^n w_{l,ki} \cdot x_i - w_{l,k0}\right) \cdot (-1)$. For $s_0(z) = \frac{1}{1 + \exp(-z)}$, we have $s_0'(z) = \frac{\exp(-z)}{(1 + \exp(-z))^2}$, i.e., $s_0'(z) = \frac{\exp(-z)}{1 + \exp(-z)} \cdot \frac{1}{1 + \exp(-z)} = (1 - s_0(z)) \cdot s_0(z)$. Thus, in the above formula, where $s_0(z) = y_{l,k}$, we get $s_0'(z) = y_{l,k} \cdot (1 - y_{l,k})$, $\frac{\partial J}{\partial w_{l,k0}} = -2\Delta y_l \cdot W_{l,k} \cdot y_{l,k} \cdot (1 - y_{l,k})$, and $\Delta w_{l,k0} = -\lambda \cdot \frac{\partial J}{\partial w_{l,k0}} = 2\lambda \cdot \Delta y_l \cdot W_{l,k} \cdot y_{l,k} \cdot (1 - y_{l,k})$. So, we have $\Delta w_{l,k0} = -\Delta W_{l,k} \cdot W_{l,k} \cdot (1 - y_{l,k})$. For $w_{l,ki}$, we have $\frac{\partial J}{\partial w_{l,ki}} = 2\Delta y_l \cdot W_{l,k} \cdot y_{l,k} \cdot (1 - y_{l,k}) \cdot x_i = -\frac{\partial J}{\partial w_{l,k0}} \cdot x_i$, hence $\Delta w_{l,ki} = -x_i \cdot \Delta w_{l,k0}$. Thus, we arrive at the following algorithm.

Resulting algorithm. We pick some value $\alpha$, and cycle through the observations $(x_1, \ldots, x_n)$ with the desired outputs $y_l$. For each observation, we first apply forward propagation to compute the network’s prediction $y_{l,NN}$; then we compute $\Delta y_l = y_{l,NN} - y_l$, $\Delta W_{l,0} = \alpha \cdot \Delta y_l$, $\Delta W_{l,k} = -\Delta W_{l,0} \cdot y_{l,k}$, $\Delta w_{l,k0} = -\Delta W_{l,k} \cdot W_{l,k} \cdot (1 - y_{l,k})$, and $\Delta w_{l,ki} = -\Delta w_{l,k0} \cdot x_i$, and update each weight $w$ to $w_{\rm new} = w + \Delta w$. We repeat this procedure until the process converges.

3 How to Pre-Train a NN to Satisfy Given Constraints

To train the network, we can use any observations $(x_1^{(m)}, \ldots, x_n^{(m)}, y_1^{(m)}, \ldots, y_L^{(m)})$ that satisfy all the known constraints. To satisfy the constraints $f_c(x_1, \ldots, x_n, y_1, \ldots, y_L) = 0$, $1 \le c \le C$, means to minimize the distance from the vector of values $(f_1, \ldots, f_C)$ to the ideal point $(0, \ldots, 0)$, i.e., equivalently, to minimize the sum $F \stackrel{\rm def}{=} \sum_{c=1}^C \left(f_c(x_1, \ldots, x_n, y_1, \ldots, y_L)\right)^2$. To minimize this sum, we can use a similar gradient descent idea. From the mathematical viewpoint, the only difference from the usual backpropagation is the first step: here,
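The update rules of the “Resulting algorithm” paragraph above can be illustrated by a short training-step sketch (Python with NumPy; it is not from the paper, and the toy observation, the random initialization, and the value $\alpha = 0.1$ are assumptions made only for this illustration):

    import numpy as np

    def s0(z):
        # sigmoid activation s0(z) = 1 / (1 + exp(-z))
        return 1.0 / (1.0 + np.exp(-z))

    def train_step(x, y_target, w, w0, W, W0, alpha=0.1):
        # One backpropagation step, using the update rules of Section 2;
        # returns the updated weights (w, w0, W, W0).
        y_k = s0(w @ x - w0)          # forward propagation: first layer
        y_nn = W @ y_k - W0           # forward propagation: second (linear) layer
        dy = y_nn - y_target          # Delta y_l
        dW0 = alpha * dy              # Delta W_{l,0} = alpha * Delta y_l
        dW = -dW0 * y_k               # Delta W_{l,k} = -Delta W_{l,0} * y_{l,k}
        dw0 = -dW * W * (1.0 - y_k)   # Delta w_{l,k0} = -Delta W_{l,k} * W_{l,k} * (1 - y_{l,k})
        dw = -np.outer(dw0, x)        # Delta w_{l,ki} = -Delta w_{l,k0} * x_i
        return w + dw, w0 + dw0, W + dW, W0 + dW0

    # Toy usage: a single observation, n = 2 inputs, K = 3 first-layer neurons.
    rng = np.random.default_rng(0)
    x_obs, y_obs = np.array([1.0, -1.0]), 0.7
    w, w0 = rng.normal(size=(3, 2)), rng.normal(size=3)
    W, W0 = rng.normal(size=3), 0.0
    for _ in range(1000):  # cycle through the (single) observation until convergence
        w, w0, W, W0 = train_step(x_obs, y_obs, w, w0, W, W0)
    print(W @ s0(w @ x_obs - w0) - W0)  # should be close to the desired output 0.7

As Section 3 notes, the same gradient-descent idea applies to the pre-training step, with the squared prediction error $J$ replaced by the sum $F$ of squared constraint values.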



Publication date: 2016